CLDR-17535 Ensure testData is good for likelySubtags #3977

Conversation

macchiati
Member

@macchiati macchiati commented Aug 21, 2024

CLDR-17535

The ICU integration was failing, so I am adding a test to verify the testData we generate, and will then fix the errors it finds.

The problem was in two overrides in which a source subtag was itself being overridden. The fix was to change two lines in tools/cldr-code/src/main/java/org/unicode/cldr/tool/GenerateLikelySubtags.java that were mapping 001 to US, and to add a test in tools/cldr-code/src/test/java/org/unicode/cldr/unittest/LikelySubtagsTest.java checking that if you have

lang_script_region => lang2_script2_region2

then

  • if lang ≠ "und" then lang2 == lang
  • if script ≠ "" then script2 == script
  • if region ≠ "" then region2 == region
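The rule above can be sketched as a standalone check. This is a minimal illustration, not the code in LikelySubtagsTest.java: the class `LikelySubtagsInvariant`, the `Tag` record, and the `violations` method are hypothetical names, and an absent subtag is assumed to be represented as an empty string.

```java
import java.util.ArrayList;
import java.util.List;

public class LikelySubtagsInvariant {
    // Hypothetical minimal tag model; "" means the subtag is absent.
    record Tag(String lang, String script, String region) {}

    // For a mapping source => target, lists every source subtag that was not preserved:
    // a language other than "und", a non-empty script, or a non-empty region.
    static List<String> violations(Tag source, Tag target) {
        List<String> problems = new ArrayList<>();
        if (!source.lang().equals("und") && !source.lang().equals(target.lang())) {
            problems.add("language " + source.lang() + " changed to " + target.lang());
        }
        if (!source.script().isEmpty() && !source.script().equals(target.script())) {
            problems.add("script " + source.script() + " changed to " + target.script());
        }
        if (!source.region().isEmpty() && !source.region().equals(target.region())) {
            problems.add("region " + source.region() + " changed to " + target.region());
        }
        return problems;
    }

    public static void main(String[] args) {
        // A mapping that replaces the source region 001 with US is flagged.
        System.out.println(violations(new Tag("und", "", "001"), new Tag("en", "Latn", "US")));
        // A mapping that only fills in the missing subtags passes.
        System.out.println(violations(new Tag("und", "", "001"), new Tag("en", "Latn", "001")));
    }
}
```

Running `main` prints `[region 001 changed to US]` and then `[]`. The first case is the kind of 001-to-US override described above: the target keeps the language slot free to vary (the source is "und") but must keep the region it was given.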

I also semi-removed GenerateLikelySubtagTests.java, because its name is confusing: people could think it is what generates the test data. I didn't delete it outright, because it is unclear whether we can do without it; instead, trying to run it now throws an exception, which will alert us if it turns out to be needed.

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

@macchiati
Member Author

After adding the test, the following failures remain, so I need to track them down.

    Error: (TestDataTest.java:348) : Maximizing und-001: expected "en-Latn-001", got "en-Latn-US"
    Error: (TestDataTest.java:351) : Minimizing (favor script) und-001: expected "en-001", got "en"
    Error: (TestDataTest.java:358) : Minimizing (favor region) und-001: expected "en-001", got "en"
    Error: (TestDataTest.java:348) : Maximizing und-Cyrl-ME: expected "ru-Cyrl-ME", got "sr-Cyrl-ME"
    Error: (TestDataTest.java:351) : Minimizing (favor script) und-Cyrl-ME: expected "ru-ME", got "sr-Cyrl-ME"
    Error: (TestDataTest.java:358) : Minimizing (favor region) und-Cyrl-ME: expected "ru-ME", got "sr-Cyrl-ME"
    Error: (TestDataTest.java:348) : Maximizing und-Cyrl-UZ: expected "ru-Cyrl-UZ", got "uz-Cyrl-UZ"
    Error: (TestDataTest.java:351) : Minimizing (favor script) und-Cyrl-UZ: expected "ru-UZ", got "uz-Cyrl"
    Error: (TestDataTest.java:358) : Minimizing (favor region) und-Cyrl-UZ: expected "ru-UZ", got "uz-Cyrl"
    Error: (TestDataTest.java:348) : Maximizing und-Hant-CN: expected "zh-Hant-CN", got "yue-Hant-CN"
    Error: (TestDataTest.java:351) : Minimizing (favor script) und-Hant-CN: expected "zh-Hant-CN", got "yue-Hant-CN"
    Error: (TestDataTest.java:358) : Minimizing (favor region) und-Hant-CN: expected "zh-Hant-CN", got "yue-Hant-CN"
    Error: (TestDataTest.java:348) : Maximizing und-Latn-001: expected "en-Latn-001", got "en-Latn-US"
    Error: (TestDataTest.java:351) : Minimizing (favor script) und-Latn-001: expected "en-001", got "en"
    Error: (TestDataTest.java:358) : Minimizing (favor region) und-Latn-001: expected "en-001", got "en"
    Error: (TestDataTest.java:348) : Maximizing und-Latn-MU: expected "mfe-Latn-MU", got "en-Latn-MU"
    Error: (TestDataTest.java:351) : Minimizing (favor script) und-Latn-MU: expected "mfe", got "en-MU"
    Error: (TestDataTest.java:358) : Minimizing (favor region) und-Latn-MU: expected "mfe", got "en-MU"
    Error: (TestDataTest.java:348) : Maximizing und-Latn-SL: expected "kri-Latn-SL", got "en-Latn-SL"
    Error: (TestDataTest.java:351) : Minimizing (favor script) und-Latn-SL: expected "kri", got "en-SL"
    Error: (TestDataTest.java:358) : Minimizing (favor region) und-Latn-SL: expected "kri", got "en-SL"
    Error: (TestDataTest.java:348) : Maximizing und-Latn-TK: expected "tkl-Latn-TK", got "en-Latn-TK"
    Error: (TestDataTest.java:351) : Minimizing (favor script) und-Latn-TK: expected "tkl", got "en-TK"
    Error: (TestDataTest.java:358) : Minimizing (favor region) und-Latn-TK: expected "tkl", got "en-TK"
    Error: (TestDataTest.java:348) : Maximizing und-Latn-ZM: expected "bem-Latn-ZM", got "en-Latn-ZM"
    Error: (TestDataTest.java:351) : Minimizing (favor script) und-Latn-ZM: expected "bem", got "en-ZM"
    Error: (TestDataTest.java:358) : Minimizing (favor region) und-Latn-ZM: expected "bem", got "en-ZM"
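The "favor script" and "favor region" entries above refer to two orderings of the minimization step. The sketch below follows the "Add Likely Subtags" and "Remove Likely Subtags" procedures as described in UTS #35, in simplified form (no `und_script` fallback, no variants or alias replacement). It is not the TestDataTest.java code; the class, the `Tag` record, and the three-entry `LIKELY` table are hypothetical, with table values chosen to match some of the "got" results in the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MinimizeSketch {
    // Hypothetical subset of likely-subtags data.
    static final Map<String, String> LIKELY = Map.of(
            "en", "en_Latn_US",
            "uz", "uz_Latn_UZ",
            "uz_Cyrl", "uz_Cyrl_UZ");

    // "" means the subtag is absent.
    record Tag(String lang, String script, String region) {
        // Joins the non-empty subtags with the given separator.
        String join(String sep) {
            List<String> parts = new ArrayList<>();
            for (String p : new String[] {lang, script, region}) {
                if (!p.isEmpty()) parts.add(p);
            }
            return String.join(sep, parts);
        }

        // Parses a full lang_Script_REGION table value.
        static Tag parse(String s) {
            String[] p = s.split("_");
            return new Tag(p[0], p[1], p[2]);
        }
    }

    // Simplified "Add Likely Subtags": look up L_S_R, L_R, L_S, L in order, then
    // fill in only the subtags the input lacks (language only if it is "und").
    static Tag maximize(Tag t) {
        Tag[] keys = {t, new Tag(t.lang(), "", t.region()),
                new Tag(t.lang(), t.script(), ""), new Tag(t.lang(), "", "")};
        for (Tag k : keys) {
            String hit = LIKELY.get(k.join("_"));
            if (hit != null) {
                Tag m = Tag.parse(hit);
                return new Tag(t.lang().equals("und") ? m.lang() : t.lang(),
                        t.script().isEmpty() ? m.script() : t.script(),
                        t.region().isEmpty() ? m.region() : t.region());
            }
        }
        return t; // no data: leave unchanged
    }

    // "Remove Likely Subtags": return the first trial that maximizes back to max.
    static Tag minimize(Tag t, boolean favorScript) {
        Tag max = maximize(t);
        Tag lang = new Tag(max.lang(), "", "");
        Tag langRegion = new Tag(max.lang(), "", max.region());
        Tag langScript = new Tag(max.lang(), max.script(), "");
        List<Tag> trials = favorScript
                ? List.of(lang, langScript, langRegion)
                : List.of(lang, langRegion, langScript);
        for (Tag trial : trials) {
            if (maximize(trial).equals(max)) return trial;
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(minimize(new Tag("uz", "Cyrl", "UZ"), false).join("-")); // uz-Cyrl
        System.out.println(minimize(new Tag("en", "Latn", "MU"), true).join("-"));   // en-MU
    }
}
```

Because maximization keeps every subtag the input supplies, a trial only round-trips when the data agrees with it, which is why the same invariant the new test enforces also matters for minimization.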

@macchiati macchiati marked this pull request as ready for review August 21, 2024 20:44
@macchiati macchiati requested a review from srl295 August 21, 2024 20:46
@macchiati
Member Author

If the tests pass, this should fix the problem, @DraganBesevic.

@DraganBesevic
Contributor

> If the tests pass, this should fix the problem, @DraganBesevic.

I have just pushed a PR for alpha2 integration, with this problem tagged as a known issue. I will regenerate the test data for likely subtags, remove the known issue in the next round of integration and see if that works.

@macchiati macchiati merged commit 67afecd into unicode-org:main Aug 21, 2024
12 checks passed
@macchiati macchiati deleted the CLDR-17535-Ensure-testData-is-good-for-likelySubtags branch August 21, 2024 21:45
@macchiati
Member Author

@DraganBesevic sounds great, hope it works!

3 participants